Abstract: We describe an approach to large-scale construction of a semantic lexicon for Chinese
verbs. We leverage three existing resources: a classification of English verbs called
EVCA (English Verb Classes and Alternations) [Levin, 1993], a Chinese conceptual database called
HowNet [Zhendong, 1988c, Zhendong, 1988b, Zhendong, 1988a] (http://www.how-net.com), and a
large machine-readable dictionary called Optilex. The resulting lexicon is used for determining
appropriate word senses in applications such as machine translation and cross-language information
retrieval.
1 Introduction

With the growing quantity of online multilingual information, automatic and semi-automatic
techniques for lexical acquisition are more critical now than ever before. We describe an approach to
large-scale construction of a semantic lexicon for Chinese verbs. We leverage three existing
resources: a classification of English verbs called EVCA (English Verb Classes and Alternations)
[Levin, 1993], a Chinese conceptual database called HowNet [Zhendong, 1988c, Zhendong, 1988b,
Zhendong, 1988a] (http://www.how-net.com), and a large machine-readable Chinese-English
dictionary called Optilex.^1

Our approach involves extraction of candidate translations from Optilex for each of the Chinese
verbs occurring in HowNet. We then create links between Chinese concepts and English classes
using thematic-role mappings between HowNet entries and EVCA-based entries. Each
Chinese-English link is subsequently associated with a sense from WordNet [Miller and Fellbaum, 1991],
thus producing a new Asian companion to the current (Euro)WordNet initiative. The resulting
lexicons are used for determining appropriate word senses in applications such as machine
translation and cross-language information retrieval.

Several researchers have investigated the problem of assigning class-based senses to verbs
[Dang et al., 1998], [Dorr and Jones, 1999], [Dorr and Jones, 1996], [Dorr, 1997],
[Jones et al., 1994], [Nomura et al., 1994], [Olsen et al., 1998], [Palmer and Wu, 1995],
[Palmer and Rosenzweig, 1996], and [Saint-Dizier, 1996].
This work extends the techniques described by [Palmer and Wu, 1995], which used a concept space
to produce a hierarchical organization of Chinese verbs. The extensions include the use of the entire
EVCA database rather than a small set of verbs (the break class) and the provision of a
thematic-role based filter. We adopt a technique that is similar in flavor to the intersective-class approach
of [Dang et al., 1998], with the following extensions: (1) concept alignment across two different
language hierarchies (Chinese and English) rather than one; (2) mappings between Chinese and
English thematic roles; and (3) hooks into WordNet senses for both languages.

The next section describes the HowNet conceptual database. Section 3 then describes
the approach we used to produce the concept-to-class correspondence. Section 4 presents the results
of our automatic acquisition experiment.

2  HowNet  Conceptual  Database 

HowNet is an on-line conceptual common-sense knowledge base that contains hierarchical
information relating concepts to the associated Chinese words. Our focus is on the verb hierarchy, which
has the structure shown in Table 1.

The number labels given here are our own; we use these to indicate the level of each concept
in the HowNet database. Note that the highest two concepts in the verb hierarchy are “static”
(V.1) and “act” (V.2). These correspond, respectively, to verbs glossed as become (under the
“static” node V.1.1.1) and start (under the “act” node V.2.1.1). The levels go much deeper
than these, with the lowest ones 8 levels deep, e.g., V.1.2.1.6.3.3.1.15 itch.

Within each of the HowNet classes is a thematic-role specification. For example, the verb “cure”
has the thematic-role specification (agent, patient, content, tool). Consider the sentence The
doctor cured the man of pneumonia using antibiotics. The roles in the specification have the
following bindings, respectively, for this sentence: doctor, man, pneumonia, antibiotics.^2 The

^1 Optilex is the machine-readable version of the CETA dictionary, licensed from the MRM Corporation, Kensington, MD.

^2 Thematic-role specifications and their use in generation of natural-language translations are described further in
V.1 |static|
    V.1.1 |relation|
        V.1.1.1 |isa|
        V.1.1.2 |possession|
        V.1.1.3 |comparison|
        V.1.1.4 |suit|
        V.1.1.5 |inclusive|
        V.1.1.6 |connective|
        V.1.1.7 |CauseResult|
        V.1.1.8 |TimeOrSpace|
        V.1.1.9 |arithmetic|
    V.1.2 |state|
        V.1.2.1 |StatePhysical|
        V.1.2.2 |StateMental|
V.2 |act|
    V.2.1 |ActGeneral|
        V.2.1.1 |start|
        V.2.1.2 |do|
        V.2.1.3 |DoNot|
        V.2.1.4 |Cease|
        V.2.1.5 |Wait|
    V.2.2 |ActSpecific|
        V.2.2.1 |AlterGeneral|
        V.2.2.2 |AlterSpecific|
    V.2.3 |AlterRelation|
        V.2.3.1 |AlterIsa|
        V.2.3.2 |AlterPossession|
        V.2.3.3 |AlterComparison|
        V.2.3.4 |AlterFitness|
        V.2.3.5 |AlterInclusion|
        V.2.3.6 |AlterConnection|
        V.2.3.7 |AlterCauseResult|
        V.2.3.8 |AlterLocation|
        V.2.3.9 |AlterTimePosition|
    V.2.4 |AlterState|
        V.2.4.1 |AlterPhysical|
        V.2.4.2 |AlterStateNormal|
        V.2.4.3 |AlterStateGood|
        V.2.4.4 |AlterQuantity|
        V.2.4.5 |AlterStateBad|
        V.2.4.6 |AlterMental|
    V.2.5 |AlterAttribute|
        V.2.5.1 |MakeHigher|
        V.2.5.2 |MakeLower|
        V.2.5.3 |AlterAppearance|
        V.2.5.4 |AlterMeasurement|
        V.2.5.5 |AlterProperty|
    V.2.6 |MakeAct|
        V.2.6.1 |CauseToDo|
        V.2.6.2 |CauseNotToDo|
        V.2.6.3 |use|

Table 1: HowNet Verb Hierarchy


Number of EVCA Classes per Concept:    0     1     2     3     4     5
Number of HowNet Concepts:             2   371    71    20    10     4

Table 2: Partitioning of HowNet Concepts into EVCA Classes


thematic-role specifications are used for prioritizing candidate HowNet-EVCA associations, as will
be described below.

3  Approach 

We have associated 478 Chinese HowNet concepts with 485 EVCA classes, demonstrating a clear
concept-to-class correspondence in a large majority of the cases.^3 The mapping between Chinese
HowNet and English EVCA (hence WordNet) involves three steps.

The first step is to produce all possible English Optilex glosses (translations) for all 12,342
Chinese verbs in HowNet and associate each Chinese verb with one or more of the 478 HowNet
concepts, forming 48,884 verb-to-concept candidates. For example, there is a common Chinese verb
拉 (la) that is multiply ambiguous, corresponding to 13 Optilex-based English glosses: slash, cut,
chat, pull, drag, transport, move, raise, help, implicate, involve, defecate, and pressgang. This verb
is associated with 9 HowNet concepts: |Transport|, |Attract|, |Excrete|, |Force|, |Help|, |Include|,
|Pull|, |Recreation|, and |Talk|.
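The pairing performed in this first step can be sketched as follows. This is a minimal illustration, not the authors' code: `optilex_glosses`, `hownet_concepts`, `Candidate`, and `build_candidates` are hypothetical names, the two dictionaries hold only the (la) example from the text, and treating one (verb, concept) pair plus its gloss list as a "candidate" is an assumption about how the candidates are formed.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """One verb-to-concept candidate (hypothetical record layout)."""
    verb: str
    concept: str
    glosses: tuple


# Stand-ins for the real resources, filled with the (la) example only.
optilex_glosses = {
    "la": ("slash", "cut", "chat", "pull", "drag", "transport", "move",
           "raise", "help", "implicate", "involve", "defecate", "pressgang"),
}
hownet_concepts = {
    "la": ("Transport", "Attract", "Excrete", "Force", "Help",
           "Include", "Pull", "Recreation", "Talk"),
}


def build_candidates(glosses, concepts):
    """Pair every verb with each of its concepts, carrying its glosses along."""
    candidates = []
    for verb, verb_concepts in concepts.items():
        verb_glosses = glosses.get(verb, ())
        for concept in verb_concepts:
            candidates.append(Candidate(verb, concept, verb_glosses))
    return candidates


candidates = build_candidates(optilex_glosses, hownet_concepts)
```

Under this reading, the single verb (la) contributes 9 candidates, one per HowNet concept, each still carrying all 13 glosses; the later steps are what narrow the glosses down.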

The second step involves associating each verb-to-concept candidate with one or more of the
485 EVCA classes, forming an average of 2 thousand verb-to-class entries per HowNet concept (on
the order of 1 million verb-to-class candidates in total). For example, the Chinese verb 拉 (la) is

[Dorr  et  al.,  1998]. 

^3 There are actually more than 800 concepts in HowNet that define events. The number was reduced to 478 for
the purpose of this preliminary experiment; a more in-depth acquisition process is currently underway to fill out the
final 300+ concepts. See [Dorr et al., 2000].
HowNet Concept    EVCA Class(es)
Transport         11.1 Send
Help              13.4.2 Equip
Apologize         32.2.a Long
Naming            29.3 Dub
Judge             29.4 Declare
Moisten           45.4.a Change of State
Excrete           40.1.2 Breathe
TakeVehicle       51.4.2.a.ii Motion by Vehicle
PlayDown          33.b Judgment (75%), 31.2.a Admire (25%)
Establish         29.2.c Characterize (90%), 26.4.a Create (19%)
Decorate          9.8.b Fill (50%), 26.1.b Build (43%), 9.9.ii Butter (25%)
Buy               10.5 Steal (8%), 13.5.1.a Get (30%), 13.5.1.b.ii Get (54%),
                  13.5.2.d Get (46%)
Teach             29.2.c Characterize (24%), 33.b Judgment (71%), 37.9.a Advise (29%),
                  37.1.a Transfer Message (45%), 31.1.a Amuse (19%)

Table 3: Examples of HowNet Partitionings with Respect to EVCA


associated with 22 EVCA classes: Admire (31.2.b, implicate, involve); Amuse (31.1.b, transport,
move, cut); Braid (41.2.2, cut); Breathe (40.1.2, defecate); Build (26.1.a, cut); Carry (11.4.i, carry,
pull, drag); Chitchat (37.6.a, chat); Crane (40.3.2, raise); Cut (21.1.a, slash, cut); Cut (21.1.d, cut);
Equip (13.4.2, help); Force (12.a.ii, pull); Get (13.5.1.a, pull); Grow (26.2.a.ii, raise); Hurt (40.8.3,
pull, cut); Meander (47.7.a, cut); Play (009, pawn); Put (9.4.a, raise); Search (35.2.a, drag); Send
(11.1, smuggle, transport, ship, convey); Send Slide (11.2.b, move); Split (23.2.b, cut, pull).
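One way to picture this second step is as a lookup from each English gloss to the EVCA classes that list it. The sketch below assumes a hypothetical in-memory table `evca_classes` containing only a handful of the classes named above; the function name and data layout are illustrative and do not describe the actual EVCA database format.

```python
from collections import defaultdict

# Hypothetical excerpt: EVCA class label -> English verbs listed in that class.
evca_classes = {
    "Cut (21.1.a)": {"slash", "cut"},
    "Equip (13.4.2)": {"help"},
    "Carry (11.4.i)": {"carry", "pull", "drag"},
    "Send (11.1)": {"smuggle", "transport", "ship", "convey"},
    "Breathe (40.1.2)": {"defecate"},
}


def classes_for_glosses(glosses, evca):
    """Return {class label: set of glosses that occur in that class}."""
    # Invert the class table once: English verb -> labels of classes listing it.
    verb_to_classes = defaultdict(set)
    for label, members in evca.items():
        for verb in members:
            verb_to_classes[verb].add(label)

    # Collect, per class, the glosses of the Chinese verb that matched.
    matches = defaultdict(set)
    for gloss in glosses:
        for label in verb_to_classes.get(gloss, ()):
            matches[label].add(gloss)
    return dict(matches)


la_glosses = ["slash", "cut", "chat", "pull", "drag", "transport", "move",
              "raise", "help", "implicate", "involve", "defecate", "pressgang"]
la_classes = classes_for_glosses(la_glosses, evca_classes)
```

With this excerpt, `la_classes` links (la) to five classes; glosses that appear in none of the listed classes, such as pressgang, simply produce no verb-to-class entry.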

The final step is to partition each HowNet concept into groups of Chinese-English pairs whose
English glosses correspond to EVCA classes. This involves three subtasks:

•  Order the candidate EVCA classes so that the highest-ranking classes are those that contain
the highest number of English verbs matching the Optilex glosses.

•  In cases where a tie-breaker is needed, reorder the candidate EVCA classes according to the
degree to which the thematic-role specification of the HowNet concept matches that of the EVCA
class.

•  For each Chinese-English entry associated with the HowNet concept, assign the
highest-ranking candidate EVCA class.
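The ordering described by these subtasks can be sketched as a sort on a two-part key: the gloss-match count first, then the thematic-role overlap as the tie-breaker. This is an illustrative sketch only. `ROLE_MAP`, `rank_classes`, and the class labelled "OtherClass" are hypothetical, and counting mapped roles is just one simple way to measure the "degree" of thematic-role match.

```python
# HowNet role -> EVCA role correspondences quoted in this section; using them
# as a single global table is a simplifying assumption.
ROLE_MAP = {
    "agent": "ag", "patient": "th", "scope": "mod-poss",
    "LocationIni": "src", "LocationFin": "goal",
}


def rank_classes(glosses, concept_roles, candidates):
    """Return candidate class labels ordered best first.

    candidates maps a class label to (member_verbs, class_roles).
    """
    def score(label):
        verbs, roles = candidates[label]
        # Primary key: how many glosses are members of the class.
        gloss_hits = len(set(glosses) & set(verbs))
        # Tie-breaker: how many HowNet roles map onto a role of the class.
        role_hits = sum(1 for r in concept_roles if ROLE_MAP.get(r) in roles)
        return (gloss_hits, role_hits)

    return sorted(candidates, key=score, reverse=True)


candidates = {
    # Invented competitor class, present only to force a tie on gloss count.
    "OtherClass (hypothetical)": ({"help"}, ("ag", "th")),
    "Equip (13.4.2)": ({"help"}, ("ag", "th", "mod-poss")),
}
ranking = rank_classes({"help"}, ("agent", "patient", "scope"), candidates)
best = ranking[0]
```

Both classes match one gloss, so the role overlap decides: Equip covers all three roles of the |Help| specification and is ranked first.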

Consider two HowNet concepts associated with the Chinese verb 拉 (la): |Help| and
|Transport|. The thematic-role specification associated with |Help| is (agent, patient, scope) (as
in John helped him with his work). This specification most closely matches that of the Equip EVCA
class (where 拉 (la) is translated as help), which has the specification _ag_th,mod-poss(with);
thus, the |Help| HowNet concept is associated with the Equip EVCA class, and the mapping
between the two is (agent->ag), (patient->th), (scope->mod-poss).

On the other hand, the |Transport| HowNet concept is associated with the thematic-role
specification (agent, patient, LocationIni, LocationFin, direction) (as in John transported the
goods from Boston to New York (westward)). This specification most closely matches that of the
Send EVCA class (where 拉 (la) is translated as transport); thus, the |Transport|
HowNet concept is associated with the Send EVCA class, and the mapping between the two is
(agent->ag), (patient->th), (LocationIni->src), (LocationFin->goal).
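To make the |Transport|-to-Send mapping concrete, the sketch below applies it to the role bindings of the example sentence. The dictionary and function names are hypothetical. Roles with no counterpart in the mapping (here, direction) are returned separately rather than guessed at.

```python
# Role mapping between the |Transport| concept and the Send class, as given above.
TRANSPORT_TO_SEND = {
    "agent": "ag",
    "patient": "th",
    "LocationIni": "src",
    "LocationFin": "goal",
}


def translate_roles(bindings, role_map):
    """Rename HowNet roles to EVCA roles; keep unmapped roles apart."""
    mapped, unmapped = {}, {}
    for role, filler in bindings.items():
        if role in role_map:
            mapped[role_map[role]] = filler
        else:
            unmapped[role] = filler
    return mapped, unmapped


# Bindings for: John transported the goods from Boston to New York (westward).
hownet_bindings = {
    "agent": "John",
    "patient": "goods",
    "LocationIni": "Boston",
    "LocationFin": "New York",
    "direction": "westward",
}
evca_bindings, leftover = translate_roles(hownet_bindings, TRANSPORT_TO_SEND)
```

Here `evca_bindings` holds the four mapped roles and `leftover` holds the direction binding, which the stated mapping does not cover.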

The end result is that the English glosses associated with 拉 (la) are filtered down to help in the
Equip semantic class and transport in the Send semantic class; the corresponding WordNet senses
are assigned (for free) from the hand-tagged EVCA database. These are Senses 1-3 in the case of
transport (i.e., move/carry/displace) and Sense 1 in the case of help (i.e., aid/assist):

•  transport: 

Sense  1:  transport 
Sense  2:  transport,  carry 
Sense  3:  transport,  send,  ship 

•  help: 

Sense  1:  help,  assist,  aid 


4  Results 

Table 2 characterizes the number of EVCA classes required for coverage of the 478 HowNet concepts.
We consider the approach to be a success for several reasons: (1) a unique EVCA class was
associated with a HowNet concept in 371 cases, 77% of the HowNet concepts; (2) most of
the other cases partitioned the HowNet entries into 2 EVCA classes; (3) only 2 cases did not
correspond to any EVCA class (i.e., every word associated with the concept belonged to a different
EVCA class); (4) there were no partitionings exceeding 5 EVCA classes.
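These figures can be recomputed directly from the counts in Table 2. The short check below is only a worked piece of arithmetic; the variable names are ours.

```python
# Counts from Table 2: number of EVCA classes per concept -> number of concepts.
distribution = {0: 2, 1: 371, 2: 71, 3: 20, 4: 10, 5: 4}

total = sum(distribution.values())                           # all concepts covered
share_unique = distribution[1] / total                       # exactly one EVCA class
share_at_most_two = (distribution[1] + distribution[2]) / total

print(f"concepts: {total}")
print(f"unique class: {share_unique:.1%}")
print(f"one or two classes: {share_at_most_two:.1%}")
```

The counts sum to 478, the unique-class share comes out at about 77.6% (reported above as 77%), and about 92.5% of the concepts need at most two EVCA classes.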

Examples of the HowNet partitionings into EVCA classes are given in Table 3, with a focus on
the cases where 1 partition was found. In cases where there is more than 1 partition, percentages
are given with respect to the number of Chinese verbs in each HowNet class.^4

5  Summary 

We have presented an approach to aligning two large-scale online resources, HowNet and EVCA.
The lexicon resulting from this approach is large-scale, containing 17,284 Chinese-English conceptual
links. The technique for producing these links involves matching semantic-role specifications in
HowNet with those in EVCA. Our results indicate that the correspondence is very high between
the 478 Chinese HowNet concepts and the 485 EVCA classes. Because each Chinese-English link is
additionally associated with a WordNet sense, we see this resource as the first step toward producing
a new Asian language companion to ongoing (Euro)WordNet initiatives.